Hands-on Exercise 3
Programming Interactive Data Visualisation and Animated Statistical Graphics with R
Summary
This hands-on exercise consists of two main topics, namely:
- Programming Interactive Data Visualisation with R
- Programming Animated Statistical Graphics with R
1 Programming Interactive Data Visualisation with R
1.1 Loading R packages
pacman::p_load(ggiraph, plotly,
               patchwork, DT, tidyverse)
R packages for Interactive Data
- ggiraph: makes ggplot2 graphics interactive.
- plotly: R library for plotting interactive statistical graphs.
- DT: provides an R interface to the JavaScript library DataTables to create interactive tables on HTML pages.
- tidyverse: a family of modern R packages designed to support data science, analysis and communication tasks, including creating static statistical graphs.
- patchwork: combines multiple ggplot2 graphs into one figure.
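pacman::p_load() installs any package that is missing and then loads it. The sketch below shows, in base R only, how one might check which of these packages are missing before loading them. The helper name missing_pkgs() is hypothetical and not part of pacman; the sketch only reports and does not install anything.

```r
# Hypothetical helper (not part of pacman): names of packages not yet installed.
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

pkgs <- c("ggiraph", "plotly", "patchwork", "DT", "tidyverse")
to_install <- missing_pkgs(pkgs)

# Report instead of installing, so the sketch has no side effects.
if (length(to_install) > 0) {
  message("Would install: ", paste(to_install, collapse = ", "))
} else {
  message("All packages are already installed.")
}
```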
1.2 Importing the Data
exam_data <- read_csv("data/Exam_data.csv")
1.3 Overview of the data
exam_data
# A tibble: 322 × 7
   ID         CLASS GENDER RACE    ENGLISH MATHS SCIENCE
   <chr>      <chr> <chr>  <chr>     <dbl> <dbl>   <dbl>
 1 Student321 3I    Male   Malay        21     9      15
 2 Student305 3I    Female Malay        24    22      16
 3 Student289 3H    Male   Chinese      26    16      16
 4 Student227 3F    Male   Chinese      27    77      31
 5 Student318 3I    Male   Malay        27    11      25
 6 Student306 3I    Female Malay        31    16      16
 7 Student313 3I    Male   Chinese      31    21      25
 8 Student316 3I    Male   Malay        31    18      27
 9 Student312 3I    Male   Malay        33    19      15
10 Student297 3H    Male   Indian       34    49      37
# ℹ 312 more rows
1.4 Interactive Data Visualisation - ggiraph methods
ggiraph makes ggplot2 graphics interactive through three aesthetics, each covered below: tooltip, data_id and onclick.
1.4.1 Tooltip effect with tooltip aesthetic
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive( # Create basic graph
aes(tooltip = ID), # specify tooltip here
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL) +
theme_minimal()
girafe(ggobj = p, # generate svg object on an html page.
width_svg = 6,
height_svg = 6*0.618)
Hover over a dot to display the student's ID.
Displaying multiple information on tooltip
# create a new field called tooltip with desired data
exam_data$tooltip <- c(paste0("Name = ", exam_data$ID,
"\n Class = ", exam_data$CLASS,
"\n Gender = ", exam_data$GENDER))
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(tooltip = tooltip), # newly created field used as tooltip field (referenced by column name, not exam_data$)
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL) +
theme_minimal()
girafe(ggobj = p,
width_svg = 8,
height_svg = 8*0.618)
Hover over a dot. Now, more information is shown!
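To check what the tooltip text looks like before attaching it to a plot, the string construction can be tried on a small data frame. This is a minimal sketch in base R; the students data frame is a stand-in built from the first two rows shown in the data overview.

```r
# Toy data with the same columns as exam_data (values taken from its first rows).
students <- data.frame(
  ID     = c("Student321", "Student305"),
  CLASS  = c("3I", "3I"),
  GENDER = c("Male", "Female")
)

# paste0() is vectorised, so this yields one tooltip string per row.
students$tooltip <- paste0("Name = ", students$ID,
                           "\n Class = ", students$CLASS,
                           "\n Gender = ", students$GENDER)

# cat() interprets "\n", showing the line breaks the tooltip will render.
cat(students$tooltip[1])
```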
Customising Tooltip style
The code chunk below uses opts_tooltip() of ggiraph to customise tooltip rendering by adding CSS declarations.
tooltip_css <-
"background-color:grey;
font-weight:bold;
color:black;
font-size: 1.2em" # customise tooltip css (font-weight, not font-style, controls boldness)
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(tooltip = ID),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL) +
theme_minimal()
girafe(ggobj = p,
width_svg = 6,
height_svg = 6*0.618,
options = list(opts_tooltip(css = tooltip_css))) # add the tooltip_css here
The background colour of the tooltip is now grey and the font colour is black and bold.
Displaying statistics on tooltip
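In the code chunk below, stat_summary() with mean_se computes the mean and its standard error for each group (mean plus or minus one standard error, which is narrower than a 90% confidence interval). The derived statistics are then displayed in the tooltip.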
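For reference, the mean_se summary used in the code chunk returns the mean (y) together with the mean minus and plus one standard error (ymin, ymax). The base R sketch below reproduces that calculation on made-up scores and shows how an approximate 90% interval would differ. The function name mean_se_sketch() is hypothetical, and the 90% half-width assumes a normal approximation.

```r
# Base R sketch of the summary that ggplot2::mean_se() computes (mult = 1).
mean_se_sketch <- function(x) {
  x <- x[!is.na(x)]
  se <- sd(x) / sqrt(length(x))   # standard error of the mean
  data.frame(y = mean(x), ymin = mean(x) - se, ymax = mean(x) + se)
}

scores <- c(21, 9, 15, 24, 22, 16)   # made-up scores for illustration
res <- mean_se_sketch(scores)

# The tooltip function receives y and ymax, so ymax - y recovers the SEM.
sem <- res$ymax - res$y

# Half-width of an approximate 90% CI under a normal approximation.
ci90_half_width <- qnorm(0.95) * sem
print(res)
print(ci90_half_width)
```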
tooltip <- function(y, ymax, accuracy = .01) {
  mean <- scales::number(y, accuracy = accuracy)
  sem <- scales::number(ymax - y, accuracy = accuracy)
  paste("Mean maths scores:", mean, "+/-", sem)
}
gg_point <- ggplot(data = exam_data,
                   aes(x = RACE)) +
  stat_summary(aes(y = MATHS,
                   tooltip = after_stat(tooltip(y, ymax))), # adding tooltip
               fun.data = "mean_se",
               geom = GeomInteractiveCol,
               fill = "light blue") +
  stat_summary(aes(y = MATHS), # adding error bar
               fun.data = mean_se,
               geom = "errorbar", width = 0.1,
               linewidth = 0.2) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_minimal()
girafe(ggobj = gg_point,
       width_svg = 8,
       height_svg = 8*0.618)
1.4.2 Hover effect with data_id
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(data_id = CLASS), # specify data_id here
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL) +
theme_minimal()
girafe(ggobj = p,
width_svg = 6,
height_svg = 6*0.618)
Elements associated with a data_id (i.e. CLASS) will be highlighted upon mouse over. The default highlight colour is orange.
Styling hover effect
Customise the highlighting effect:
- using opts_hover() for the effect on hovered geometries
- using opts_hover_inv() for the effect on the other geometries
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(data_id = CLASS), # specify data_id here
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL) +
theme_minimal()
girafe(ggobj = p,
width_svg = 6,
height_svg = 6*0.618,
options = list(opts_hover(css = "fill: blue;"), # effect on geometries
opts_hover_inv(css = "opacity:0.2;"))) # effect on other geometries
Unlike the tooltip styling example, where the CSS was first stored in tooltip_css, here the CSS declarations are written directly in the function calls.
1.4.3 Combining tooltip and hover effect
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(tooltip = CLASS, # specify tooltip here
data_id = CLASS), # specify data_id here
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL) +
theme_minimal()
girafe(ggobj = p,
width_svg = 6,
height_svg = 6*0.618,
options = list(opts_hover(css = "fill: blue;"), # effect on geometries
opts_hover_inv(css = "opacity:0.2;"))) # effect on other geometries
Elements associated with a data_id (i.e. CLASS) will be highlighted upon mouse over. At the same time, the tooltip will show the CLASS.
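The same combination of tooltip and data_id can be tried without the exam data file. The sketch below is an illustration only: it uses the built-in mtcars data and geom_point_interactive() instead of the dot plot, with the car model standing in for the tooltip text and the number of cylinders standing in for CLASS.

```r
library(ggplot2)
library(ggiraph)

# mtcars ships with R; car names and cylinder counts stand in for ID and CLASS.
cars <- data.frame(
  model = rownames(mtcars),
  wt    = mtcars$wt,
  mpg   = mtcars$mpg,
  cyl   = as.character(mtcars$cyl)
)

p <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point_interactive(
    aes(tooltip = model,   # text shown on hover
        data_id = cyl),    # points sharing this value are highlighted together
    size = 3) +
  theme_minimal()

widget <- girafe(
  ggobj = p,
  width_svg = 6,
  height_svg = 6 * 0.618,
  options = list(
    opts_hover(css = "fill: blue;"),
    opts_hover_inv(css = "opacity:0.2;")))

# Display the widget only in an interactive session (viewer or browser).
if (interactive()) print(widget)
```

Hovering over any point should highlight every car with the same number of cylinders while the tooltip names the individual model.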
1.4.4 Coordinated Multiple Views
p1 <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(tooltip = CLASS,
data_id = ID),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
coord_cartesian(xlim=c(0,100)) +
scale_y_continuous(NULL,
breaks = NULL) +
theme_minimal()
p2 <- ggplot(data=exam_data,
aes(x = ENGLISH)) +
geom_dotplot_interactive(
aes(tooltip = CLASS,
data_id = ID),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
coord_cartesian(xlim=c(0,100)) +
scale_y_continuous(NULL,
breaks = NULL) +
theme_minimal()
girafe(code = print(p1 / p2),
width_svg = 6,
height_svg = 6,
options = list(
opts_hover(css = "fill: #202020;"),
opts_hover_inv(css = "opacity:0.2;")))
Notice that when a data point in one of the dot plots is selected, the data point with the same ID in the other plot is highlighted too.
The data_id aesthetic is critical for linking observations between plots; the tooltip aesthetic is optional but useful when hovering over a point.
1.4.5 Click effect with onclick
The onclick aesthetic of ggiraph provides hotlink interactivity on the web. It takes a string of JavaScript that is executed when the element is clicked.
exam_data$onclick <- sprintf("window.open(\"%s%s\")",
"https://www.moe.gov.sg/schoolfinder?journey=Primary%20school",
as.character(exam_data$ID))
p <- ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot_interactive(
aes(onclick = onclick),
stackgroups = TRUE,
binwidth = 1,
method = "histodot") +
scale_y_continuous(NULL,
breaks = NULL) +
theme_minimal()
girafe(ggobj = p,
width_svg = 6,
height_svg = 6*0.618)
Clicking a data object opens the web document linked to it in the web browser.
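When the value appended to the URL can contain spaces or other reserved characters, it helps to encode it before building the onclick string. The sketch below is a base R illustration under assumptions: the base URL and the IDs are hypothetical and are not the ones used in this exercise.

```r
# Hypothetical base URL and IDs, used only for illustration.
base_url <- "https://example.org/search?q="
ids <- c("Student 321", "Student 305")

# URLencode() escapes characters such as spaces so the query string stays valid.
encoded_ids <- vapply(ids, utils::URLencode, character(1),
                      reserved = TRUE, USE.NAMES = FALSE)

# The URL is passed as an argument, not embedded in the format string,
# so any literal "%" it contains is not treated as a sprintf() placeholder.
onclick <- sprintf("window.open(\"%s%s\")", base_url, encoded_ids)
print(onclick)
```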
1.5 Interactive Data Visualisation - plotly methods
There are two ways to create an interactive graph with plotly:
- by using plot_ly()
- by using ggplotly()
1.5.1 Using plot_ly()
plot_ly(data = exam_data,
        x = ~MATHS,
        y = ~ENGLISH)
plot_ly(data = exam_data,
        x = ~ENGLISH,
        y = ~MATHS,
        color = ~RACE)
1.5.2 Using ggplotly()
- Appropriate ggplot2 functions are used to create a scatter plot.
- ggplotly() is used to convert the R graphic object into an interactive object.
p <- ggplot(data=exam_data,
aes(x = MATHS,
y = ENGLISH)) +
geom_point(size=1) +
coord_cartesian(xlim=c(0,100),
ylim=c(0,100))
ggplotly(p) # add this line
1.5.3 Coordinated Multiple Views with ggplotly()
Three steps for creating coordinated linked plots:
- highlight_key() of the plotly package is used to create the shared data.
- Two scatterplots are created by using ggplot2 functions.
- subplot() of the plotly package is used to place them side by side.
d <- highlight_key(exam_data) # Step 1
p1 <- ggplot(data=d, # Step 2
aes(x = MATHS,
y = ENGLISH)) +
geom_point(size=1) +
coord_cartesian(xlim=c(0,100),
ylim=c(0,100))
p2 <- ggplot(data=d,
aes(x = MATHS,
y = SCIENCE)) +
geom_point(size=1) +
coord_cartesian(xlim=c(0,100),
ylim=c(0,100))
subplot(ggplotly(p1), # Step 3
ggplotly(p2))
1.6 Interactive Data Visualisation - crosstalk methods
- Crosstalk is an add-on to the htmlwidgets package.
- It extends htmlwidgets with a set of classes, functions, and conventions for implementing cross-widget interactions (currently, linked brushing and filtering).
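Linked brushing is demonstrated later in this section; filtering can be sketched as follows. This is a minimal illustration under assumptions: the scores data frame is made up, and it uses crosstalk's SharedData class and filter_checkbox() together with a DT table rather than anything from the exercise data.

```r
library(crosstalk)
library(DT)

# Toy data shaped like exam_data (values are made up for illustration).
scores <- data.frame(
  ID    = c("S1", "S2", "S3", "S4"),
  CLASS = c("3A", "3A", "3B", "3B"),
  MATHS = c(55, 72, 64, 81)
)

# SharedData wraps the data frame so several widgets can share selections and filters.
shared <- SharedData$new(scores, key = ~ID)

# A checkbox filter and a table bound to the same SharedData object:
# ticking a class filters the rows shown in the table.
page <- bscols(
  widths = c(3, 9),
  filter_checkbox(id = "class", label = "Class",
                  sharedData = shared, group = ~CLASS),
  datatable(shared)
)

# Display the page only in an interactive session (viewer or browser).
if (interactive()) print(page)
```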
1.6.1 Interactive Data Table: DT package
The DT package allows data objects to be rendered as HTML tables.
DT::datatable(exam_data[c("ID","CLASS","GENDER","RACE","ENGLISH","MATHS","SCIENCE")], class= "compact")
1.6.2 Linked brushing: crosstalk method
d <- highlight_key(exam_data)
p <- ggplot(d,
aes(ENGLISH,
MATHS)) +
geom_point(size=1) +
coord_cartesian(xlim=c(0,100),
ylim=c(0,100))
gg <- highlight(ggplotly(p),
"plotly_selected")
crosstalk::bscols(gg,
DT::datatable(d),
widths = 5)
Things to learn from the code chunk:
- highlight() is a function of the plotly package. It sets a variety of options for brushing (i.e., highlighting) multiple plots. These options are primarily designed for linking multiple plotly graphs, and may not behave as expected when linking plotly to another htmlwidget package via crosstalk. In some cases, other htmlwidgets will respect these options, such as persistent selection in leaflet.
- bscols() is a helper function of the crosstalk package. It makes it easy to put HTML elements side by side. It can be called directly from the console but is especially designed to work in an R Markdown document. Warning: this will bring in all of Bootstrap!
2 Programming Animated Statistical Graphics with R
2.1 Loading R packages
pacman::p_load(readxl, gifski, gapminder,
               plotly, gganimate, tidyverse)
R packages for animated plots
- plotly : plotting interactive statistical graphs.
- gganimate : creating animated statistical graphs.
- gifski : converts video frames to GIF animations using pngquant’s fancy features for efficient cross-frame palettes and temporal dithering. It produces animated GIFs that use thousands of colors per frame.
- gapminder: An excerpt of the data available at Gapminder.org. We just want to use its country_colors scheme.
- tidyverse : a family of modern R packages designed to support data science, analysis and communication tasks, including creating static statistical graphs.
2.2 Importing the Data
col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet = "Data") %>%
  # convert "Country" and "Continent" (the names stored in col) to factor
  mutate(across(all_of(col), as.factor)) %>%
  # convert "Year" to integer
  mutate(Year = as.integer(Year))
Alternatively, use mutate_at():
col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
sheet="Data") %>%
# change "Country" and "Continent" (aka col) as factor
mutate_at(col, as.factor) %>%
# change "Year" as integer
mutate(Year = as.integer(Year))
Things to learn from the code chunks above:
- read_xls() of the readxl package is used to import the Excel worksheet.
- mutate() of the dplyr package is used to create new columns or modify existing columns as functions of existing variables.
- across() applies the same function to multiple columns.
- mutate_at() applies a function to the columns named in col, here converting them to factor. It is superseded by across() in current dplyr.
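The two approaches can be compared on a small tibble without the Excel file. The sketch below uses made-up rows with the same column names as globalPop and checks that across() with all_of() and mutate_at() give the same result.

```r
library(dplyr)

# Toy rows with the same column names as globalPop (values are made up).
pop <- tibble(
  Country   = c("A", "B"),
  Continent = c("Asia", "Europe"),
  Year      = c(1996, 1998)
)
col <- c("Country", "Continent")

# across() with all_of() selects the columns named in the character vector.
with_across <- pop %>%
  mutate(across(all_of(col), as.factor),
         Year = as.integer(Year))

# mutate_at() is the older (superseded) way of doing the same thing.
with_mutate_at <- pop %>%
  mutate_at(col, as.factor) %>%
  mutate(Year = as.integer(Year))

str(with_across)
```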
2.3 Overview of the data
globalPop
# A tibble: 6,204 × 6
   Country      Year Young   Old Population Continent
   <fct>       <int> <dbl> <dbl>      <dbl> <fct>
 1 Afghanistan  1996  83.6   4.5     21560. Asia
 2 Afghanistan  1998  84.1   4.5     22913. Asia
 3 Afghanistan  2000  84.6   4.5     23898. Asia
 4 Afghanistan  2002  85.1   4.5     25268. Asia
 5 Afghanistan  2004  84.5   4.5     28514. Asia
 6 Afghanistan  2006  84.3   4.6     31057  Asia
 7 Afghanistan  2008  84.1   4.6     32738. Asia
 8 Afghanistan  2010  83.7   4.6     34505. Asia
 9 Afghanistan  2012  82.9   4.6     36416. Asia
10 Afghanistan  2014  82.1   4.7     38327. Asia
# ℹ 6,194 more rows
2.4 Animated Data Visualisation: gganimate methods
gganimate extends the grammar of graphics as implemented by ggplot2 to include the description of animation. It does this by providing a range of new grammar classes that can be added to the plot object in order to customise how it should change with time.
Sample Syntax
- transition_*() defines how the data should be spread out and how it relates to itself across time.
- view_*() defines how the positional scales should change along the animation.
- shadow_*() defines how data from other points in time should be presented in the given point in time.
- enter_*()/exit_*() defines how new data should appear and how old data should disappear during the course of the animation.
- ease_aes() defines how different aesthetics should be eased during transitions.
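The grammar classes listed above are added to a ggplot object with +, like any other layer. The sketch below is an illustration on made-up data and is not the plot built in this exercise; it combines one function from most of the families and only constructs the animation object, leaving rendering to the reader.

```r
library(ggplot2)
library(gganimate)

# Made-up data: three points observed in three years.
toy <- data.frame(
  id   = rep(c("a", "b", "c"), times = 3),
  year = rep(c(2000L, 2001L, 2002L), each = 3),
  x    = c(1, 2, 3, 2, 3, 4, 3, 4, 5),
  y    = c(3, 1, 2, 4, 2, 3, 5, 3, 4)
)

anim <- ggplot(toy, aes(x = x, y = y, group = id)) +
  geom_point(size = 4) +
  labs(title = "Year: {frame_time}") +
  transition_time(year) +          # transition_*(): how data change over time
  shadow_wake(wake_length = 0.2) + # shadow_*(): show a fading trail of earlier frames
  view_follow() +                  # view_*(): let the axes follow the data
  ease_aes("cubic-in-out")         # ease_aes(): ease in and out of each state

# Rendering is left to the reader, e.g. animate(anim, nframes = 50, fps = 10).
```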
ggplot(globalPop, aes(x = Old, y = Young,
size = Population, # the size of dot depends on population
colour = Country)) +
geom_point(alpha = 0.7,
show.legend = FALSE) +
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
labs(x = '% Aged',
y = '% Young') 
ggplot(globalPop, aes(x = Old, y = Young,
size = Population,
colour = Country)) +
geom_point(alpha = 0.7,
show.legend = FALSE) +
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
labs(title = 'Year: {frame_time}',
x = '% Aged',
y = '% Young') +
transition_time(Year) + # add this line
ease_aes('linear') # and this line
For the animated plot:
- transition_time() of gganimate is used to create transitions through distinct states in time (i.e. Year).
- ease_aes() is used to control the easing of aesthetics. The default is linear. Other methods are: quadratic, cubic, quartic, quintic, sine, circular, exponential, elastic, back, and bounce.
2.5 Animated Data Visualisation: plotly
2.5.1 Using plot_ly()
bp <- globalPop %>%
plot_ly(x = ~Old,
y = ~Young,
size = ~Population,
color = ~Continent,
sizes = c(2, 100),
frame = ~Year,
text = ~Country,
hoverinfo = "text",
type = 'scatter',
mode = 'markers'
) %>%
layout(showlegend = FALSE)
bp
2.5.2 Using ggplotly()
- Appropriate ggplot2 functions are used to create a static bubble plot.
- The output is then saved as an R object called gg.
- ggplotly() is used to convert the R graphic object into an animated svg object.
gg <- ggplot(globalPop,
aes(x = Old,
y = Young,
size = Population,
colour = Country)) +
geom_point(aes(size = Population,
frame = Year),
alpha = 0.7,
show.legend = FALSE) + # this doesn't work
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
labs(x = '% Aged',
y = '% Young')
ggplotly(gg)
gg <- ggplot(globalPop,
aes(x = Old,
y = Young,
size = Population,
colour = Country)) +
geom_point(aes(size = Population,
frame = Year),
alpha = 0.7) +
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
labs(x = '% Aged',
y = '% Young') +
theme(legend.position='none') # use this instead
ggplotly(gg)
Things to learn from the code chunk above
- Although the show.legend = FALSE argument was used, the legend still appears on the plot.
- To overcome this problem, theme(legend.position='none') should be used.
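Another option is to hide the legend on the plotly side, after the conversion. The sketch below is an assumption-laden illustration: it uses the built-in mtcars data rather than globalPop and plotly's hide_legend(), and it has not been checked against the animated bubble plot in this exercise.

```r
library(ggplot2)
library(plotly)

# Built-in data; cyl plays the role of the colour grouping.
gg_cars <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# hide_legend() is applied to the converted plotly object rather than to the ggplot.
pl <- hide_legend(ggplotly(gg_cars))

# Display the widget only in an interactive session (viewer or browser).
if (interactive()) print(pl)
```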
3 Reference
Kam, T.S. (2023). Programming Interactive Data Visualisation with R.
Kam, T.S. (2023). Programming Animated Statistical Graphics with R.